Mining Textual Stream with Partial Labeled Instances Using Ensemble Framework

Authors

  • Ge Song
  • Yan Li
  • Chunshan Li
  • Jingjing Chen
  • Yunming Ye
Abstract

Growing access to large-scale, high-dimensional, and non-stationary streams in many real applications has made it necessary to design new dynamic classification algorithms. Most existing approaches to textual stream classification rely on labeled data to train the model. However, only a limited number of instances can be labeled in a real streaming environment, since large-scale data arrive at high speed. It is therefore useful to make unlabeled instances available for training and updating the ensemble models. In this paper, we present a new ensemble framework that learns from textual streams with partially labeled instances. A new semi-supervised cluster-based classifier is proposed as the sub-classifier in our approach, and an adaptive selection method is proposed to integrate these sub-classifiers. Empirical evaluation on textual streams shows that our approach outperforms state-of-the-art stream classification algorithms.

Similar resources

A Dynamic Ensemble Framework for Mining Textual Streams with Class Imbalance

Textual stream classification has become a realistic and challenging issue since large-scale, high-dimensional, and non-stationary streams with class imbalance have been widely used in various real-life applications. Given the characteristics of textual streams, it is technically difficult to classify them, especially in an imbalanced environment. In this paper, ...

Detecting Concept Drift in Data Stream Using Semi-Supervised Classification

Data stream is a sequence of data generated from various information sources at a high speed and high volume. Classifying data streams faces the three challenges of unlimited length, online processing, and concept drift. In related research, to meet the challenge of unlimited stream length, commonly the stream is divided into fixed size windows or gradual forgetting is used. Concept drift refer...

Naïve Bayes Classification Ensembles to Support Modeling Decisions in Data Stream Mining

Data stream mining is the process of applying data mining methods to a data stream in real-time in order to create descriptive or predictive models. Due to the dynamic nature of data streams, new classes may emerge as a data stream evolves, and the concept being modeled may change with time. This gives rise to the need to continuously make revisions to the predictive model. Revising the predict...

A Semi-supervised Ensemble Approach for Mining Data Streams

There are many challenges in mining data streams, such as infinite length, evolving nature and lack of labeled instances. Accordingly, a semi-supervised ensemble approach for mining data streams is presented in this paper. Data streams are divided into data chunks to deal with the infinite length. An ensemble classification model E is trained with existing labeled data chunks and decision bound...

A Data Intensive Multi-chunk Ensemble Technique to Classify Stream Data Using Map-Reduce Framework

We propose a data-intensive, distributed, multi-chunk ensemble classifier-based data mining technique to classify data streams. In our approach, we combine the r most recent consecutive data chunks with the data chunks in the current ensemble and generate a new ensemble using this data for training. By introducing this multi-chunk ensemble technique in a Map-Reduce framework and considering the concep...

Publication date: 2014